A Proof for P =? NP Problem
The P =? NP problem is an important problem in contemporary mathematics and theoretical computer science. Many proofs have been proposed for this problem. This paper proposes a theoretic proof for the P =? NP problem. The central idea of this proof is a recursive definition of Turing machines (TMs) that accept the encoding strings of valid TMs. Using this definition, an infinite sequence of TMs is constructed, and it is proven that the sequence includes all valid TMs. Based on these TMs, the class D that includes all decidable languages is defined. By proving P=D, the result P=NP is proven.

Categories and Subject Descriptors: F.1.3 [Complexity Measures and Classes]: Relations among complexity classes; G.2.1 [Combinatorics]: Counting problems

General Terms: Computation, Language, Algorithm

Additional Key Words and Phrases: computational complexity, recursion, decidability, countability

1. INTRODUCTION

The P =? NP problem is to determine whether every language accepted by some nondeterministic algorithm in polynomial time is also accepted by some (deterministic) algorithm in polynomial time [Cook 2000]. Since the problem was first posed in Gödel's famous 1956 letter to von Neumann, it has been regarded as one of the most important problems in contemporary mathematics and theoretical computer science [Sipser 1992]. Gödel considered the P =? NP problem as a finitary analogue of Hilbert's Entscheidungsproblem: given a mathematical statement, is there a procedure that either finds a proof or finds a disproof?

The notation P stands for "polynomial time". In 1965, Jack Edmonds [Edmonds 1965] gave an efficient algorithm for the Matching problem and suggested a formal definition of efficient computation (running in a number of steps bounded by some fixed polynomial of the input size). The class of problems with efficient solutions later became known as the class P. The class NP can be roughly characterized as the class of problems that have efficiently verifiable solutions. The notation NP stands for "nondeterministic polynomial time", since NP was originally defined in terms of nondeterministic Turing machines [Goldreich 2004]. Nondeterministic Turing machines have more than one possible move from a given configuration. Intuitively, the P =? NP problem asks whether every algorithmic problem with efficiently verifiable solutions also has efficiently computable solutions.

2. PROBLEM DEFINITION 

In this paper, we mainly follow the official problem definition announced by the Clay Mathematics Institute in 2000. A Turing machine (TM) M is a 4-tuple (Σ, Γ, Q, δ), where Σ, Γ, Q, δ are finite nonempty sets. The set Σ is a finite alphabet with more than one symbol, and the set Γ is a finite alphabet such that Γ ⊇ Σ. The state set Q contains three special states: the initial state q_0, the accept state q_accept and the reject state q_reject. The transition function δ satisfies

$$\delta : (Q - \{q_{accept}, q_{reject}\}) \times \Gamma \rightarrow Q \times \Gamma \times \{-1, 1\}.$$

The interpretation of δ is that if M is in state q ∈ Q - {q_accept, q_reject} scanning the symbol s ∈ Γ and δ(q, s) = (q', s', h), then q' ∈ Q is the new state, s' ∈ Γ is the symbol printed, and the tape head moves left (h = -1) or right (h = 1) one square. Let Σ* be the set of finite strings over Σ, and let Γ* be the set of finite strings over Γ. A configuration of M is a string xqy with x, y ∈ Γ*, y not the empty string, and q ∈ Q. The interpretation of the configuration xqy is that M is in state q scanning the left-most symbol of y, with xy on its tape. A configuration xqy is halting if q ∈ {q_accept, q_reject}. For each nonhalting configuration C, M has a unique transition configuration C' such that C → C'. The computation of M on input w ∈ Σ* is the unique sequence C_0, C_1, ... of configurations such that C_0 = q_0 w and C_i → C_{i+1}. M accepts w iff the computation is finite and the final configuration contains the state q_accept. Conversely, M rejects w iff the computation is infinite or the final configuration contains the state q_reject. The number of steps, denoted by t_M(w), is one less than the number of configurations. In terms of a TM M = (Σ, Γ, Q, δ), the elements of the class P are languages over Σ; each language L in P is a subset of Σ*. The language accepted by machine M is defined by

$$L(M) = \{w \in \Sigma^* \mid M \text{ accepts } w\}.$$

For input size n ∈ N, the worst-case running time of M is defined by

$$T_M(n) = \max\{t_M(w) \mid w \in \Sigma^n\}$$

where Σ^n is the set of all strings over Σ of length n. We say that M runs in polynomial time if there exists k ∈ N such that T_M(n) ≤ n^k + k for all n. Therefore, the class P of languages is defined by

$$P = \{L(M) \mid M \text{ runs in polynomial time}\}.$$
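To make configurations, t_M(w) and T_M(n) concrete, here is a minimal Python sketch of a deterministic single-tape TM. It is an illustration under our own assumptions, not part of the original definition: the blank symbol "_", the state names q0/qa/qr, the step limit max_steps (a program cannot detect an infinite computation, so the sketch raises instead of rejecting) and the example machine EVEN_ONES are all hypothetical. The tape is two-way infinite, so a left move from the leftmost square reaches a blank.

```python
from itertools import product
from typing import Dict, Tuple

# delta maps (state, scanned symbol) -> (new state, printed symbol, move in {-1, 1}).
Delta = Dict[Tuple[str, str], Tuple[str, str, int]]
BLANK = "_"  # assumed blank symbol of Gamma


def run(delta: Delta, w: str, q0: str = "q0", q_accept: str = "qa",
        q_reject: str = "qr", max_steps: int = 10_000) -> Tuple[bool, int]:
    """Compute C_0 -> C_1 -> ... and return (accepted, t_M(w))."""
    tape = dict(enumerate(w))  # two-way infinite tape, blank where unset
    head, state, steps = 0, q0, 0
    while state not in (q_accept, q_reject):
        if steps >= max_steps:
            raise RuntimeError("no halting configuration within max_steps")
        symbol = tape.get(head, BLANK)
        state, tape[head], move = delta[(state, symbol)]
        head += move
        steps += 1  # one transition C_i -> C_{i+1}
    return state == q_accept, steps


def worst_case_time(delta: Delta, sigma: str, n: int) -> int:
    """T_M(n) = max{t_M(w) | w in Sigma^n}, by brute force over all inputs of length n."""
    return max(run(delta, "".join(w))[1] for w in product(sigma, repeat=n))


# Hypothetical example machine: accept binary strings with an even number of 1s.
EVEN_ONES: Delta = {
    ("q0", "0"): ("q0", "0", 1), ("q0", "1"): ("q1", "1", 1),
    ("q1", "0"): ("q1", "0", 1), ("q1", "1"): ("q0", "1", 1),
    ("q0", BLANK): ("qa", BLANK, 1), ("q1", BLANK): ("qr", BLANK, 1),
}

if __name__ == "__main__":
    print(run(EVEN_ONES, "1010"))               # (True, 5)
    print(worst_case_time(EVEN_ONES, "01", 4))  # 5
```

For EVEN_ONES every input of length n takes n + 1 steps (one per symbol plus the final blank), so T_M(n) = n + 1 ≤ n^1 + 1 and this machine runs in polynomial time in the sense above.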

Instead of the original nondeterministic definition, NP can be defined in terms of deterministic TMs with a verifying relation. A verifying relation is a binary relation R ⊆ Σ* × Σ_1* for some finite alphabets Σ and Σ_1. For each verifying relation R, an associated language L_R over Σ ∪ Σ_1 ∪ {#} is defined by

$$L_R = \{w\#y \mid R(w, y)\}$$

where the symbol # is in neither Σ nor Σ_1. We say that R is polynomial-time iff L_R ∈ P. The class NP of languages is defined by

$$NP = \{L \subseteq \Sigma^* \mid \exists k \in \mathbb{N}\ \exists \text{ polynomial-time } R\ \forall w \in \Sigma^*:\ (w \in L \iff \exists y\,(|y| \le |w|^k \wedge R(w, y)))\}$$

where |w| and |y| denote the lengths of w and y respectively. In other words, a language L over Σ is in NP iff there is k ∈ N and a polynomial-time verifying relation R ⊆ Σ* × Σ_1* such that, for all w ∈ Σ*, w ∈ L iff there is some y with |y| ≤ |w|^k and w#y ∈ L_R. Finally, with the definitions of P and NP, the P =? NP problem is the question

$$P \stackrel{?}{=} NP$$
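The verifier view of NP can be illustrated with the Subset Sum Problem (SSP) discussed in Section 3.1. The sketch below is a hypothetical illustration that works on Python values rather than on strings w#y over Σ ∪ Σ_1 ∪ {#}: an instance w is a list of integers plus a target, and a certificate y is a tuple of distinct indices, so |y| is bounded by the size of w. Checking R(w, y) takes polynomial time, whereas the naive decider may try all 2^m index subsets for m numbers.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Instance = Tuple[List[int], int]  # w = (numbers, target); hypothetical encoding
Certificate = Tuple[int, ...]     # y = distinct indices into numbers


def R(w: Instance, y: Certificate) -> bool:
    """Verifying relation: y selects distinct indices whose numbers sum to the target."""
    numbers, target = w
    if len(set(y)) != len(y) or any(not 0 <= i < len(numbers) for i in y):
        return False
    return sum(numbers[i] for i in y) == target  # time polynomial in the size of w


def decide_by_search(w: Instance) -> Tuple[bool, Optional[Certificate]]:
    """Decide w in L by trying every certificate; up to 2**len(numbers) calls to R."""
    numbers, _ = w
    for size in range(len(numbers) + 1):
        for y in combinations(range(len(numbers)), size):
            if R(w, y):
                return True, y
    return False, None


if __name__ == "__main__":
    print(decide_by_search(([3, 34, 4, 12, 5, 2], 9)))   # (True, (2, 4)): 4 + 5 = 9
    print(decide_by_search(([3, 34, 4, 12, 5, 2], 30)))  # (False, None)
```

The gap between the cheap check in R and the exhaustive loop in decide_by_search is exactly what the P =? NP question asks about.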
3. THE CONJECTURE AND APPROACHES TO PROVE IT

3.1 Conjecture 

There are two possible answers to the P =? NP question: P = NP and P ≠ NP. Most complexity theorists believe that P ≠ NP [Cook 2000]. This can perhaps be partly explained by two facts: first, no NP-complete problem has so far been solved exactly in polynomial time; second, many results that appear to approach P ≠ NP have been obtained through the heavy efforts to prove P ≠ NP. Although the conjecture P ≠ NP is still open, the P ≠ NP assumption plays an important role in modern cryptography [Goldreich 2004]. For instance, the security of the internet, including most electronic transactions, relies on the assumption that it is difficult to factor integers or to break DES (the Data Encryption Standard). The P = NP conjecture may lead to a very different world, one in which every problem that has an efficiently verifiable solution can also be solved efficiently [Fortnow 2009]. Nevertheless, the P = NP conjecture has some supporting facts as well. Most hard NP problems turn out to be easy to solve in most cases. For example, Manindra Agrawal, Neeraj Kayal and Nitin Saxena [?] proposed a polynomial-time algorithm to determine whether an integer is prime or composite. In fact, there are already practical efficient algorithms for NP problems that work well in most cases. Take SSP (the Subset Sum Problem) as an example: Lagarias and Odlyzko [Lagarias and Odlyzko 1985] discovered a density property of SSP and proposed a lattice-reduction-based method to solve the problem. They proved that their method is efficient for low-density SSP. Wan and Shi [Wan and Shi 2008] proposed an enumeration-based approach for SSP and proved that their approach is efficient for high- and medium-density SSP. The lattice reduction method and the enumeration method, together with many other approaches, cover SSP over almost the whole density range from two symmetric directions.

3.2 Approaches 

3.2.1 Completeness. Serious work on P =? NP began following the discovery of NP-completeness by Cook [Cook 1971], Karp [Karp 1972], and Levin [Levin 1973] in the 1970s. The main results in [Cook 1971] are that several natural problems, including SAT, 3-SAT and subgraph isomorphism, are NP-complete, i.e., every problem in NP can be reduced to these NP-complete problems in polynomial time. A year later, Karp [Karp 1972] showed that 20 other natural problems are NP-complete, and refined the standard notation P, NP and NP-completeness. Independently of Cook and Karp, the notion of a "universal search problem", similar to that of an NP-complete problem, was proposed by Levin [Levin 1973], who gave six examples, including SAT. Their work provides a possible way to prove P=NP: an efficient algorithm for any NP-complete problem would imply an efficient algorithm for every NP problem. Woeginger [Woeginger 2003] gave a survey of exact algorithms for NP-hard problems.

3.2.2 Diagonalization. In 1874 Cantor [Cantor 1874] showed that there are more real numbers than algebraic numbers using a technique known as diagonalization. The technique was used by Gödel in his Incompleteness Theorem and by Turing in his undecidability results, and was then refined to prove computational complexity lower bounds [Wigderson and Smale 2006]. Diagonalization has also been used in attempts to separate NP from P. The main idea is to construct an NP language L such that every polynomial-time algorithm fails to compute L correctly on some input. Many complexity results have been obtained by diagonalization, including Hartmanis and Stearns's Time Hierarchy Theorem, Stearns, Hartmanis and Lewis's Space Hierarchy Theorem, Cook's Nondeterministic Time Hierarchy Theorem and Ladner's Theorem. Observing a feature shared by many such complexity results, Baker, Gill and Solovay [Baker et al. 1975] introduced the notion of relativization and showed that relativizing arguments do not suffice to resolve the P =? NP question.
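The core of diagonalization fits in a few lines. In this toy Python sketch (an illustration of the general idea, not a construction from the works cited above), a finite list of 0/1-valued functions stands in for an enumeration of machines, and the diagonal function g(i) = 1 - f_i(i) disagrees with every f_i on input i, so g cannot appear in the list.

```python
from typing import Callable, List

Fn = Callable[[int], int]


def diagonal(fs: List[Fn]) -> Fn:
    """Return g with g(i) = 1 - f_i(i) for every index i of the list."""
    def g(i: int) -> int:
        if 0 <= i < len(fs):
            return 1 - fs[i](i)  # flip the i-th diagonal entry
        return 0                 # outside the list: arbitrary value
    return g


if __name__ == "__main__":
    fs = [lambda n: 0, lambda n: 1, lambda n: n % 2, lambda n: (n // 2) % 2]
    g = diagonal(fs)
    print([g(i) for i in range(len(fs))])                 # [1, 0, 1, 0]
    print(all(g(i) != fs[i](i) for i in range(len(fs))))  # True
```

Complexity-theoretic uses follow the same pattern, with the list replaced by an enumeration of resource-bounded machines and the flip performed by a machine that simulates the i-th one.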

3.2.3 Circuit Complexity. As a model of computation, the Boolean circuit is a generalization of Boolean formulae and a rough formalization of the familiar "silicon chip". A Boolean circuit is a diagram showing how to derive an output from an input by a combination of the basic Boolean operations OR (∨), AND (∧) and NOT (¬). Because the size of a circuit is the analog of time in algorithms, proving lower bounds for circuits implies lower bounds for algorithms. In 1949, Shannon defined circuit complexity, including monotone circuit complexity, and proved that for almost all Boolean functions f : {0, 1}^n → {0, 1}, any Boolean circuit needs on the order of 2^n/n gates to compute f. Based on circuit complexity, a number of new techniques have been used to prove lower bounds for natural, restricted classes of circuits, including the Random Restriction method of Furst, Saxe and Sipser [Furst et al. 1981] and Ajtai [Ajtai 1983] (proving lower bounds on constant-depth circuits), and the Communication Complexity method of Karchmer and Wigderson [Karchmer and Wigderson 1988] (proving lower bounds on monotone formulae). However, all attempts in this line have failed to obtain any nontrivial lower bounds for general circuits. Razborov and Rudich [Razborov and Rudich 1994] explain this failure by pointing out that any natural proof of such a lower bound would imply subexponential algorithms for inverting every candidate one-way function; thus natural proofs of general circuit lower bounds are unlikely.
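Shannon's bound comes from a counting argument. A sketch, with constants chosen for convenience (it yields 2^n/(4n) rather than 2^n/n), assuming gates ∧, ∨, ¬ of fan-in at most two and taking the last gate as the output: every function computable with at most s gates is also computed by a circuit with exactly s gates (pad with unused gates), and each gate is fixed by its type and two predecessors among at most n + s nodes. With s = ⌊2^n/(4n)⌋ and n ≥ 8 we have n ≤ s, so the number N(s) of such circuits satisfies

$$
\begin{aligned}
N(s) &\le \bigl(3(n+s)^2\bigr)^{s} \le \bigl(12 s^2\bigr)^{s}, \qquad 12 s^2 \le \tfrac{3}{4n^2}\,2^{2n} < 2^{2n},\\
N(s) &< 2^{2ns} \le 2^{2^n/2}, \qquad \#\{f : \{0,1\}^n \to \{0,1\}\} = 2^{2^n},\\
\frac{N(s)}{2^{2^n}} &< 2^{-2^n/2} \longrightarrow 0 .
\end{aligned}
$$

Since each circuit computes a single function, almost every f needs more than 2^n/(4n) gates.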

3.2.4 Proof Complexity. The concept of proof is central to the study of mathematics. While circuit complexity is used to classify functions according to the difficulty of computing them, proof complexity is used to classify theorems according to the difficulty of proving them. Just as Boolean circuits capture the power of computation, proof systems capture the power of reasoning allowed to the prover. By cleverly combining non-relativizing arithmetization with non-naturalizing diagonalization, some proofs have overcome both the relativization barrier and the natural proofs barrier. Buhrman, Fortnow, and Thierauf [Buhrman et al. 1998] first showed that MA_EXP ⊄ P/poly and proved that their result was non-relativizing by giving an oracle A such that MA_EXP^A ⊂ P^A/poly. Vinodchandran [Vinodchandran 2005] and Aaronson [Aaronson 2006] showed that for every fixed k, the class PP does not have circuits of size n^k, and proved that the result was non-relativizing by giving an oracle A such that PP^A ⊂ SIZE^A(n). Santhanam [Santhanam 2007] improved Vinodchandran's result and showed that for every fixed k, the class PromiseMA does not have circuits of size n^k. Aaronson and Wigderson [Aaronson and Wigderson 2009] defined the notion of algebrization by introducing algebraic oracles, and showed how algebrization captures a new barrier by proving two sets of results. The first set shows that all known non-relativizing results based on arithmetization can be obtained by using algebraic oracles. The second set shows that many basic complexity questions, including P =? NP, will require non-algebrizing techniques.

3.2.5 Geometric Complexity. Ketan Mulmuley and Milind Sohoni [Mulmuley and Sohoni 2001] observed that problems in many complexity classes can be encoded as group actions on vectors in algebraic geometry, and then proposed the arithmetic circuit as a new model of computation. Given a family of high-dimensional polytopes P_n based on group representations on certain algebraic varieties, they argue that if P_n contains an integral point for each input size n, then any circuit family for the Hamiltonian path problem must have size at least n^(log n), which implies P ≠ NP. They showed that arithmetic circuits bypass the barriers of relativization, natural proofs, and algebrization by using a natural abstract strategy called "flip", and proposed a new barrier called the complexity barrier. Mulmuley [Mulmuley 2009] pointed out that the complexity barrier may be the root difficulty of the P =? NP problem by showing the need for nonelementary techniques to tackle it. Although two concrete lower bounds based on the flip, P ≠ NC and #P ≠ NC, have been obtained, the eventual crossing still seems to rely on solving the Riemann Hypothesis.

4. CONCEPTUAL PROOF 

4.1 Decidable Language 

Definition 4.1 Decidable Language. If a language L can be decided by a TM M, i.e., M accepts every string w ∈ L and rejects every string w ∉ L, we say that L is a decidable language (or that L is decidable).

Definition 4.2 The Class D. The class D is the class of decidable languages. It is defined as

$$D = \{L(M) \mid M \text{ is a TM}\}.$$

Lemma 4.3. For any finite alphabet Σ, the set of all TMs over Σ is countable.

Proof. For any finite alphabet Σ, we denote the set of all strings over Σ by Σ*. For each integer n ≥ 0, there are only finitely many strings of length n; therefore the set Σ* can be counted in the following way:

Step 0: count all strings of length 0;
Step 1: count all strings of length 1;
...
Step n: count all strings of length n;
...

Because each TM M has an encoding as a string ⟨M⟩ (distinct TMs have distinct encodings) and Σ* is countable, the set of all TMs is countable. □
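The counting procedure in this proof is length-first (shortlex) enumeration. A minimal Python sketch, assuming each symbol of Σ is a single character: enumerate_strings carries out Step 0, Step 1, ... in order, and the helper index_of (our own, for illustration) computes the position of a string in that enumeration, which makes the one-to-one correspondence between Σ* and N explicit. Enumerating TMs would additionally require keeping only valid encodings, which this sketch does not attempt.

```python
from itertools import count, islice, product
from typing import Iterator, Sequence


def enumerate_strings(sigma: Sequence[str]) -> Iterator[str]:
    """Yield all strings over sigma: Step 0 (length 0), Step 1 (length 1), ..."""
    for n in count(0):
        # Step n: the finitely many strings of length n, in lexicographic order
        for letters in product(sigma, repeat=n):
            yield "".join(letters)


def index_of(w: str, sigma: Sequence[str]) -> int:
    """Position of w in enumerate_strings(sigma), assuming single-character symbols."""
    b = len(sigma)
    digit = {s: d for d, s in enumerate(sigma)}
    shorter = sum(b ** k for k in range(len(w)))  # strings counted in earlier steps
    rank = 0
    for ch in w:  # rank of w within Step len(w)
        rank = rank * b + digit[ch]
    return shorter + rank


if __name__ == "__main__":
    print(list(islice(enumerate_strings("ab"), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
    print(index_of("ba", "ab"))                       # 5
```

Because every string appears after finitely many steps and index_of inverts the enumeration, each string of Σ* receives exactly one natural number.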

Lemma 4.4. For any finite alphabet Σ, the class D over Σ is countable.

Proof. Because the set of all TMs over Σ is countable and each TM M determines exactly one language L(M), the languages L(M) over Σ are countable. From the definition of the class D, it follows that D is countable. □
4.2 Recursion and Turing machine

Since the set of all TMs is countable, it is natural to look for a method to count all TMs. The main problem in counting all TMs is how to distinguish valid encoding strings from all other strings over the given alphabet. Based on the theory of recursion, we propose a recursive definition of a TM sequence Q that distinguishes valid TMs and their encoding strings. The TMs in Q take strings over the given alphabet as input and accept those strings which are recognized as valid encoding strings of TMs. For any finite alphabet Σ, the TM sequence Q is defined as follows:

M_00: {return reject.}

M_10: {for any input string w, if M_00 accepts w then
          return accept
       else if w = ⟨M_00⟩ then
          return accept
       else return reject.}

M_11: {for any input string w, if M_00 accepts w then
          return accept
       else if w = ⟨M_11⟩ then
          return accept
       else return reject.}

...

M_(i)(2*j): {for any input string w, if M_(i-1)(j) accepts w then
          return accept
       else if w = ⟨M_(i-1)(j)⟩ then
          return accept
       else return reject.}

M_(i)(2*j+1): {for any input string w, if M_(i-1)(j) accepts w then
          return accept
       else if w = ⟨M_(i)(2*j+1)⟩ then
          return accept
       else return reject.}



Figure 1 illustrates the recursive definition of all TMs and organizes the TMs in the form of a binary tree.

According to the definition, each TM M_ij in the sequence Q is identified by two integers i and j. Integer i is the number of TMs whose encoding strings are accepted by M_ij, and integer j is the ordinal number of M_ij among all TMs that accept i TM encoding strings. The first TM M_00 is the smallest TM that rejects every input string; therefore its encoding ⟨M_00⟩ should have the shortest length among all TMs over Σ. Except for M_00, each TM M_(i)(2*j) in Q accepts all strings accepted by M_(i-1)(j) together with the encoding string ⟨M_(i-1)(j)⟩ of M_(i-1)(j), and each TM M_(i)(2*j+1) in Q accepts all strings accepted by M_(i-1)(j) together with its own encoding string ⟨M_(i)(2*j+1)⟩. In this recursive way, each TM accepts no fewer strings than its parent TM, and the TMs form an infinite TM sequence

Q = (M_00, M_10, M_11, ..., M_(i)(2*j), M_(i)(2*j+1), ...).

Fig. 1. The recursively defined TM sequence in the form of binary tree.
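The recursive definition of Q can be mirrored directly in code. In this Python sketch the machines are not real TMs: encoding(i, j) is a hypothetical placeholder string standing in for ⟨M_ij⟩, accepts follows the pseudocode for M_(i)(2*j) and M_(i)(2*j+1), and sequence lists the indices of Q level by level, which is the binary-tree order of Fig. 1.

```python
from typing import Iterator, Tuple


def encoding(i: int, j: int) -> str:
    """Hypothetical placeholder for the encoding string <M_ij> (not a real TM encoding)."""
    return f"<M_{i}_{j}>"


def parent(i: int, j: int) -> Tuple[int, int]:
    """M_(i)(2*j) and M_(i)(2*j+1) are both defined from M_(i-1)(j)."""
    return i - 1, j // 2


def accepts(i: int, j: int, w: str) -> bool:
    """Run the pseudocode of M_ij on input w."""
    if i == 0:
        return False  # M_00: return reject
    p = parent(i, j)
    if accepts(*p, w):  # if the parent accepts w then return accept
        return True
    # even j: test w against the parent's encoding; odd j: against M_ij's own encoding
    target = encoding(*p) if j % 2 == 0 else encoding(i, j)
    return w == target


def sequence(levels: int) -> Iterator[Tuple[int, int]]:
    """Indices of Q level by level, i.e. the binary tree of Fig. 1: (0,0), (1,0), (1,1), ..."""
    for i in range(levels):
        for j in range(2 ** i):
            yield i, j


if __name__ == "__main__":
    print(list(sequence(3)))  # [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (2, 3)]
    print([encoding(*ij) for ij in sequence(3) if accepts(2, 1, encoding(*ij))])
    # ['<M_0_0>', '<M_2_1>']
```

Each call to accepts first runs the parent, so a child accepts every string its parent accepts, matching the description above.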

In the following section, we will show some properties of Q and then prove that 
P=NP. 

4.3 P=NP 

Lemma 4.5. For any finite alphabet Σ, the TM sequence Q includes all valid TMs over Σ.

Proof. By induction.

Step 1: It is obvious that M_00 is the only TM that accepts no TM encoding string.

Step 2: Assume there is a TM M_1k, k > 1, which accepts only one TM encoding string ⟨M⟩. The TM M is neither M_00 nor M_1k itself. Therefore, TM M accepts the encoding string of at least one TM M' that is different from M. According to the recursive definition, TM M_1k can accept the encoding string that is accepted by M. As a result, M_1k accepts more than one encoding string, which contradicts the assumption that M_1k accepts only one encoding string. Therefore, no TMs accept exactly one TM encoding string except M_10 and M_11 in Q.

Step 3: Suppose all TMs M_ij that accept i TM encoding strings are included in the TM sequence Q.

Preprint Manuscript, Vol. V, No. N, September 2010.

8 • WAN C.L.

Step 4: Assume there is a TM M_(i+1)(k), accepting i + 1 TM encoding strings, that is not in the TM sequence Q. Because all TMs M_ij that accept i TM encoding strings are included in Q, there is at most one TM M whose encoding string ⟨M⟩ is accepted by M_(i+1)(k) but not accepted by any M_ij. According to the recursive definition of the TMs in Q, the TM M must not be any M_ij or M_(i+1)(j); otherwise, M_(i+1)(k) would be included in the TM sequence Q. Therefore, TM M accepts the encoding string of at least one TM M' that is different from M. As a result, M_(i+1)(k) accepts at least i + 2 TM encoding strings, which contradicts the assumption that M_(i+1)(k) accepts i + 1 TM encoding strings. Therefore, no TMs accept i + 1 TM encoding strings except the TMs M_(i+1)(j) in Q.

Step 5: By induction, we have proved that the TM sequence Q includes all valid TMs over Σ.
□

Lemma 4.6. P ⊆ NP.

Proof. The proof of this lemma is trivial. Given any finite alphabet Σ that has at least one element, for each language L over Σ, if L ∈ P, then there exists at least one polynomial-time verifying relation R ⊆ Σ* × Σ* such that R(w, y) ⟺ w ∈ L for all w, y ∈ Σ*. Taking y to be the empty string satisfies |y| ≤ |w|^k for every k, so L ∈ NP. Thus we have P ⊆ NP. □
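The construction in this proof can be sketched as follows. This is an illustrative Python sketch under our own assumptions: decide is any decider for L (a hypothetical parity language is used below), the verifier ignores its certificate, and membership in L_R = {w#y | R(w, y)} is checked by splitting at the single separator #. Deciding L_R therefore costs one split plus one call to the decider for L.

```python
from typing import Callable

Decider = Callable[[str], bool]
Verifier = Callable[[str, str], bool]


def verifier_from_decider(decide: Decider) -> Verifier:
    """R(w, y) <=> w in L: the certificate y is simply ignored."""
    return lambda w, y: decide(w)


def in_L_R(R: Verifier, s: str, sep: str = "#") -> bool:
    """Decide s in L_R = {w#y | R(w, y)}: one split plus one call to R."""
    if s.count(sep) != 1:
        return False
    w, y = s.split(sep)
    return R(w, y)


if __name__ == "__main__":
    even_ones = lambda w: w.count("1") % 2 == 0  # hypothetical language L in P
    R = verifier_from_decider(even_ones)
    print(in_L_R(R, "1010#"), in_L_R(R, "1#"))   # True False
```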

Lemma 4.7. P ⊆ D.

Proof. According to the definition of the class P, each language L in P can be decided by a TM that runs in polynomial time. Thus L is a decidable language, and hence P ⊆ D. □

Lemma 4.8. Every TM in Q runs in polynomial time.

Proof. By induction.

Step 1: It is obvious that M_00 runs in polynomial time because it always halts in one step.

Step 2: We assume that M_ij runs in polynomial time.

Step 3: Because M_(i+1)(2*j) and M_(i+1)(2*j+1) always simulate M_ij first, if the input string w is accepted by M_ij, then M_(i+1)(2*j) and M_(i+1)(2*j+1) run like M_ij in polynomial time. Otherwise, the input string w is rejected by M_ij, and M_(i+1)(2*j) and M_(i+1)(2*j+1) perform one more comparison to decide whether w equals the encoding string of M_ij or of M_(i+1)(2*j+1), respectively; i.e., M_(i+1)(2*j) and M_(i+1)(2*j+1) run in polynomial time too.

Step 4: By induction, we have proved that every TM in Q runs in polynomial time.
□
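Step 3 reasons about how M_(i+1)(2*j) and M_(i+1)(2*j+1) reuse the computation of M_ij. The hypothetical Python sketch below counts the equality tests made along this chain of simulations (again with a placeholder encoding, not real TM encodings) and shows that M_ij performs at most i such tests on any input. It counts tests only and does not model the cost of an individual test on a TM.

```python
from typing import Tuple


def encoding(i: int, j: int) -> str:
    """Hypothetical placeholder for <M_ij>; not a real TM encoding."""
    return f"<M_{i}_{j}>"


def simulate(i: int, j: int, w: str) -> Tuple[bool, int]:
    """Run M_ij on w as in the pseudocode; return (accepted, number of equality tests)."""
    if i == 0:
        return False, 0  # M_00 rejects without any test
    pi, pj = i - 1, j // 2
    ok, tests = simulate(pi, pj, w)  # first simulate the parent M_(i-1)(j//2)
    if ok:
        return True, tests  # parent accepted: no further work
    target = encoding(pi, pj) if j % 2 == 0 else encoding(i, j)
    return w == target, tests + 1  # one more comparison, as in Step 3


if __name__ == "__main__":
    print(simulate(3, 5, "x"))        # (False, 3)
    print(simulate(2, 0, "<M_0_0>"))  # (True, 1)
```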

Lemma 4.9. D ⊆ P.

Proof. For each language L(M) in D, there is a TM M in Q that decides L(M) (Lemma 4.5). Because all TMs in Q run in polynomial time (Lemma 4.8), we have L(M) ∈ P and hence D ⊆ P. □

Theorem 4.10. P=NP 

Proof. Because D ⊆ P and P ⊆ D, we have D = P. Suppose P ≠ NP; because P ⊆ NP, there is at least one language L such that L ∈ NP and L ∉ P. Because P = D, the language L is not in D. However, L ∈ NP is decided by a TM that tries all certificates y with |y| ≤ |w|^k, so L is decidable and belongs to D, which is a contradiction. Therefore, we have proved that P = NP. □

5. CONCLUSION 

This paper first introduces the P =? NP problem through a survey of the related conjecture and of approaches to proving it. After a formal definition of decidable languages, this paper gives a recursive definition of Turing machines that accept the encoding strings of valid TMs. Based on this definition, an infinite sequence of TMs is constructed, and it is proven that the sequence includes all valid TMs. Based on these TMs, the class D that includes all decidable languages is defined. Finally, the result P=NP is proved by proving P=D.